Parsing German with Latent Variable Grammars
Abstract
We describe experiments on learning latent variable grammars for various German treebanks, using a language-agnostic statistical approach. In our method, a minimal initial grammar is hierarchically refined using an adaptive split-and-merge EM procedure, giving compact, accurate grammars. The learning procedure directly maximizes the likelihood of the training treebank, without the use of any language-specific or linguistically constrained features. Nonetheless, the resulting grammars encode many linguistically interpretable patterns and give the best published parsing accuracies on three German treebanks.
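As a rough illustration of the "split" half of a split-and-merge procedure, the sketch below doubles every category of a toy binary grammar and spreads each rule's probability over the new subcategories with a little random noise, so that a later EM pass could tell the subcategories apart. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `split_grammar`, the dictionary rule format, the noise level, and the toy grammar are all hypothetical, and the EM re-estimation and merge steps are omitted.

```python
import random
from collections import defaultdict


def split_grammar(rules, noise=0.01, seed=0):
    """Split every category X into X_0 and X_1 (hypothetical helper).

    `rules` maps (parent, left_child, right_child) -> probability,
    where the probabilities for each parent sum to 1.
    """
    rng = random.Random(seed)
    split = {}
    for (parent, left, right), prob in rules.items():
        for i in range(2):
            for j in range(2):
                for k in range(2):
                    # Share the mass equally among the four child
                    # combinations, then perturb it slightly so that
                    # EM can break the symmetry between subcategories.
                    jitter = 1.0 + rng.uniform(-noise, noise)
                    new_rule = (f"{parent}_{i}", f"{left}_{j}", f"{right}_{k}")
                    split[new_rule] = prob / 4 * jitter

    # Renormalize so each split parent keeps a proper distribution.
    totals = defaultdict(float)
    for (parent, _, _), prob in split.items():
        totals[parent] += prob
    return {rule: prob / totals[rule[0]] for rule, prob in split.items()}


# Toy grammar with binary rules only (an assumption for brevity).
toy = {("S", "NP", "VP"): 1.0, ("NP", "NP", "NP"): 1.0, ("VP", "VP", "NP"): 1.0}
refined = split_grammar(toy)
```

Each original rule becomes eight refined rules (two parent subcategories times four child combinations). In a full procedure, EM would then re-estimate these probabilities on the treebank, and splits that contribute little to the likelihood would be merged back.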
Related resources
Products of Random Latent Variable Grammars
We show that the automatically induced latent variable grammars of Petrov et al. (2006) vary widely in their underlying representations, depending on their EM initialization point. We use this to our advantage, combining multiple automatically learned grammars into an unweighted product model, which gives significantly improved performance over state-of-the-art individual grammars. In our model,...
Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing
We present a discriminative, latent variable approach to syntactic parsing in which rules exist at multiple scales of refinement. The model is formally a latent variable CRF grammar over trees, learned by iteratively splitting grammar productions (not categories). Different regions of the grammar are refined to different degrees, yielding grammars which are three orders of magnitude smaller tha...
Latent-Variable PCFGs: Background and Applications
Latent-variable probabilistic context-free grammars are latent-variable models that are based on context-free grammars. Nonterminals are associated with latent states that provide contextual information during the top-down rewriting process of the grammar. We survey a few of the techniques used to estimate such grammars and to parse text with them. We also give an overview of what the latent st...
Parsing German Topological Fields with Probabilistic Context-Free Grammars
Jackie Chi Kit Cheung, M.Sc., Graduate Department of Computer Science, University of Toronto, 2009. Syntactic analysis is useful for many natural language processing applications requiring further semantic analysis. Recent research in statistical parsing has produced a number of high-performance parsers using probabilistic co...
Generative and Discriminative Latent Variable Grammars
Latent variable grammars take an observed (coarse) treebank and induce more fine-grained grammar categories that are better suited for modeling the syntax of natural languages. Estimation can be done in a generative or a discriminative framework, and results in the best published parsing accuracies over a wide range of syntactically divergent languages and domains. In this paper we highlight t...